Abstract

Given a set $S$ of $n \ge d$ points in general position in $\mathbb{R}^d$, a random hyperplane split is obtained by sampling $d$ points uniformly at random without replacement from $S$ and splitting based on their affine hull. A random hyperplane search tree is a binary space partition tree obtained by recursive application of random hyperplane splits. We investigate the structural distributions of such random trees with a particular focus on the growth with $d$. A blessing of dimensionality arises: as $d$ increases, random hyperplane splits more closely resemble perfectly balanced splits; in turn, random hyperplane search trees more closely resemble perfectly balanced binary search trees.

We prove that, for any fixed dimension $d$, a random hyperplane search tree storing $n$ points has height at most $(1 + O(1/\sqrt{d}))\log_2 n$ and average element depth at most $(1 + O(1/d))\log_2 n$ with high probability as $n \to \infty$. Further, we show that these bounds are asymptotically optimal with respect to $d$.


1 Introduction 

Point sets in $\mathbb{R}^d$ can be partitioned recursively by a number of possible trees. The early, and still most popular, choices are the k-d tree and the quadtree. The k-d tree takes a point from the set and partitions the space into two sets with a hyperplane that contains the point and is perpendicular to one of the axes. In a quadtree, the split is into $2^d$ quadrants obtained by shifting the origin to the point in question. A lot of ink has been spilled on the analysis of the shapes of these trees for random point sets; for a summary and mini-survey, see Devroye [9].

In this paper, we focus on deterministic point sets, outside the control of the user, and random partitions that are built on them. For example, in either of the two trees mentioned above, one could choose a splitting point uniformly at random, and make independent choices recursively on the subsets. We assume throughout that points are in general position (no three on a line, no four on a plane, and so forth). In both examples, if the set of data points lies on the moment curve $\{(x, x^2, \ldots, x^d) : x \in \mathbb{R}\}$, then the tree thus obtained is statistically equivalent to a random binary search tree.

Analysis of random tree data structures typically focuses on two functions that quantify the level of balance: the depth (specifically, the mean point depth) and the height (i.e., the maximum point depth). These values are of particular practical importance: when searching for a point in the tree, they correspond respectively to the average-case and worst-case query times. For a perfectly balanced binary search tree, using $D^*_n$ and $H^*_n$ to denote the depth and the height, we have

$$\lim_{n\to\infty} \frac{D^*_n}{\log_2 n} = \lim_{n\to\infty} \frac{H^*_n}{\log_2 n} = 1,$$

which is the best we can hope for when dealing with binary trees.

Figure 1: A hyperplane search tree in $\mathbb{R}^d$: the point set and the hyperplane splits (left) and the corresponding tree data structure (right). Internal tree nodes correspond to hyperplane splits and contain $d$ data points each. External tree nodes correspond to cells and contain between $0$ and $d - 1$ data points each.

A random k-d tree in any dimension has the same shape as a random binary search tree; notably, the distribution does not depend on the structure of the point set, only on its size. We use $H_n$ and $D_n$ to denote the height and depth of a tree storing $n$ points. It is known that $D_n/\log_2 n \to 2\log 2 = 1.38629\ldots$ in probability [22, 20, 7]. The limit law for $D_n$ was derived by Devroye [7]: $(D_n - 2\log n)/\sqrt{2\log n} \to \mathcal{N}(0, 1)$ in distribution. Robson [31], Pittel [29], and Devroye [3 E] showed that $H_n/\log_2 n \to 4.31107\ldots \times \log 2 = 2.98820\ldots$ in probability. See Mahmoud [23] for more background.

1.1 Random hyperplane search trees 

For a given set $S$ of $n$ points in general position in $\mathbb{R}^d$, a random hyperplane search tree is constructed as follows. If $n \ge d$, it selects at the root level $d$ points uniformly at random without replacement from the $n$ data points, and considers the hyperplane through these points, i.e., their affine hull. These $d$ pivot points are associated with the root node and remain there. The hyperplane splits the remaining $n - d$ points into two sets that are handled recursively and independently. If $n < d$, no splitting is applied, and all $n$ points are associated with the root, which becomes a leaf. This construction guarantees that each internal node holds $d$ data points and each leaf node holds between $0$ and $d - 1$ data points. See Figure 1 for an example.
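The recursive construction above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes NumPy, uses Gaussian test points (in general position with probability 1), and decides on which side of the pivots' affine hull a point $p$ lies via the sign of $\det[x_2 - x_1, \ldots, x_d - x_1, p - x_1]$. The helper names `side_of` and `point_depths` are hypothetical.

```python
import numpy as np


def side_of(pivots: np.ndarray, p: np.ndarray) -> float:
    """Return +1 or -1 depending on which open halfspace of aff(pivots) contains p."""
    base = pivots[0]
    # Rows x_2 - x_1, ..., x_d - x_1, p - x_1 form a d x d matrix.
    m = np.vstack([pivots[1:] - base, p - base])
    return float(np.sign(np.linalg.det(m)))


def point_depths(points: np.ndarray, rng: np.random.Generator, depth: int = 0, out=None) -> list:
    """Build a random hyperplane search tree on `points`; return the depth of every point."""
    n, d = points.shape
    if out is None:
        out = []
    if n < d:  # leaf: keeps its 0..d-1 points
        out.extend([depth] * n)
        return out
    idx = rng.choice(n, size=d, replace=False)  # d pivots, uniformly without replacement
    pivots = points[idx]
    out.extend([depth] * d)  # the pivots stay at this internal node
    rest = np.delete(points, idx, axis=0)
    signs = np.array([side_of(pivots, p) for p in rest])
    point_depths(rest[signs > 0], rng, depth + 1, out)  # one side of the hyperplane
    point_depths(rest[signs < 0], rng, depth + 1, out)  # the other side
    return out
```

Calling `point_depths(pts, rng)` on an `(n, d)` array returns a list whose maximum is the height and whose mean is the average point depth, so both can be compared directly with $\log_2 n$.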

A key feature of hyperplane search trees is that they are constructed independently of the axes and are therefore robust to affine transformations of the underlying point set. If the set of points stored in a k-d tree undergoes a rotation, the k-d tree has to be reconstructed; this is not the case for hyperplane search trees.



Applications. Hyperplane search trees have been used since the 1970s in many applications of statistics. For example, Mizoguchi et al. [26] highlighted their use in pattern recognition and You and Fu [36] considered their use as tree classifiers. Tree-based decisions in pattern recognition are popular because they require little computational effort as a function of $n$. This is especially crucial when decisions must be made online, in real time. Not only does the logarithmic behavior in $n$ matter, but so do the asymptotic constants. For an introduction to tree classification, see Chapter 20 of Devroye, Györfi and Lugosi [10]. In computational geometry, trees based upon partitions of space by means of hyperplanes are ubiquitous. See for example the survey of Edelsbrunner and van Leeuwen [13] or the work of Haussler and Welzl [16] on simplex range queries. For more examples and references, see Section 2 of our previous paper [11].



1.2 Results 

Define $\mathcal{S}_{n,d} = \{S : S \subset \mathbb{R}^d,\ |S| = n,\ S \text{ is in general position}\}$. For a set $S \in \mathcal{S}_{n,d}$ we use $H(S)$ and $D(S)$ to denote the height and mean data point depth of a random hyperplane search tree built on $S$; $H(S)$ and $D(S)$ are random variables. By a trivial coupling argument, $H(S)$ stochastically dominates $D(S)$, i.e., $\mathbb{P}\{D(S) \le t\} \ge \mathbb{P}\{H(S) \le t\}$ for any value of $t$. In order to express bounds on $H(S)$ and $D(S)$ more cleanly, we define

$$C_H(d) \stackrel{\text{def}}{=} \inf\left\{ c \in \mathbb{R} : \lim_{n\to\infty} \inf_{S \in \mathcal{S}_{n,d}} \mathbb{P}\left\{ \frac{H(S)}{\log_2 n} \le c \right\} = 1 \right\}, \tag{1}$$

$$C_D(d) \stackrel{\text{def}}{=} \inf\left\{ c \in \mathbb{R} : \lim_{n\to\infty} \inf_{S \in \mathcal{S}_{n,d}} \mathbb{P}\left\{ \frac{D(S)}{\log_2 n} \le c \right\} = 1 \right\}. \tag{2}$$

The phenomenon that we wish to investigate is that, uniformly over all sets $S \in \mathcal{S}_{n,d}$, the behavior of $H(S)$ and $D(S)$ is nearly optimal when $d$ is large. It is already known [11] that $C_H(1) = C_H(2) = 2.98820\ldots$ and that $C_H(d) < C_H(1)$ for $d \ge 3$, thus showing that hyperplane search trees perform at least as well as random binary search trees or k-d trees in all dimensions, with the improvement being strict when $d \ge 3$. The present note makes this more precise, and shows that in fact $\lim_{d\to\infty} C_H(d) = 1$. Thus, by pushing up $d$, we can ensure almost perfectly balanced trees almost all the time, as we have the trivial lower bound $H(S) \ge \log_2 n - O(1)$ for fixed $d$. We also derive expressions and bounds on $C_H(d)$ and $C_D(d)$ as we proceed.

It is natural to go beyond this result and ask how quickly random hyperplane search trees 
become perfectly balanced as d increases. We find that the constants corresponding to height and 
average depth decay at different rates. The main contribution of this paper is proving asymptotically 
optimal bounds for these rates, stated in the following two theorems: 

Theorem 1.
1. $C_H(d) = 1 + O(d^{-1/2})$.
2. This bound is asymptotically optimal since there exists a function $g_H(d) = 1 + \Omega(d^{-1/2})$ such that
$$\lim_{n\to\infty} \max_{S \in \mathcal{S}_{n,d}} \mathbb{P}\left\{ \frac{H(S)}{\log_2 n} > g_H(d) \right\} = 1.$$

Theorem 2.
1. $C_D(d) = 1 + O(d^{-1})$.
2. This bound is asymptotically optimal since there exists a function $g_D(d) = 1 + \Omega(d^{-1})$ such that
$$\lim_{n\to\infty} \max_{S \in \mathcal{S}_{n,d}} \mathbb{P}\left\{ \frac{D(S)}{\log_2 n} > g_D(d) \right\} = 1.$$



1.3 Outline 

In Section 2 we examine random hyperplane search trees built on moment curve point sets. These point sets are conjectured to yield the most unbalanced random hyperplane splits. We discuss their connection with median-of-$(2t+1)$ trees and give simple, closed-form asymptotic lower bounds for the constants governing the height and depth of these trees. These provide the tightness parts of Theorems 1 and 2.

In Section 3 we present several simple balance lemmas for random hyperplane splits. In Section 4 we introduce binary trees whose splits are dominated by a random variable, and in Section 5 we bound the height of such dominated trees and prove Theorem 1.

In Section 6 we introduce two lemmas providing simple and powerful bounds for the analysis of random split trees. The first lemma bounds the logarithmic moment of a class of random variables that often arise in the analysis of random split trees. The second lemma bounds the depth of a random split tree using a dominating split variable. We apply these lemmas to obtain an almost-tight depth bound using our simple geometric lemmas from Section 3.

Finally, in Section 7 we introduce a stronger balance lemma proved by Wagner [32] and restate it in the language of this paper. Using this stronger balance lemma, we prove an asymptotically tight depth bound.



2 Moment curve point sets and median-of-(2t + 1) trees 

The tightness parts of Theorems 1 and 2 can be shown for the moment curve data $X = \{x_1, \ldots, x_n\}$, where

$$x_i = (i, i^2, \ldots, i^d), \qquad 1 \le i \le n.$$

The points on the moment curve are parametrically ordered, and thus we can order them by first coordinate and refer to the points by their index between $1$ and $n$.

Analysis of random hyperplane splits on such point sets is quite clean. Choose $d$ of the indices $1, \ldots, n$ uniformly at random without replacement. This yields $d + 1$ (possibly empty) intervals into which the other points fall. Number the intervals. One side of the hyperplane corresponds to all odd-numbered intervals, and the other side to the even-numbered ones. If the intervals catch $N_1, N_2, \ldots, N_{d+1}$ points (with sum $n - d$, of course), then one subtree of the root has $N_1 + N_3 + \cdots$ data points, and the other one $N_2 + N_4 + \cdots$ data points. A statistically equivalent description yielding the same interval sizes uses $n$ i.i.d. uniform $[0, 1]$ random variables $U_1, \ldots, U_n$. Use $U_1, \ldots, U_d$ to define the $d + 1$ uniform spacings of $[0, 1]$, which we call $S_1, \ldots, S_{d+1}$. Then "throw" the remaining $n - d$ points into the intervals. The cardinalities are distributed as $(N_1, \ldots, N_{d+1})$, and are multinomial$(n - d, S_1, \ldots, S_{d+1})$. The size of the odd side of the hyperplane thus is distributed as a sum of multinomial components: it is binomial$(n - d, S_1 + S_3 + \cdots)$. It is well known that uniform spacings are identically distributed and that their distribution is permutation-invariant (see, e.g., Pyke, 1965). Thus, the odd side of the hyperplane is of size distributed as a binomial$(n - d, S_1 + S_2 + \cdots + S_{(d+1)/2})$ if $d$ is odd and as a binomial$(n - d, S_1 + S_2 + \cdots + S_{(d+2)/2})$ if $d$ is even. But $S_1 + S_2 + \cdots + S_k$ is distributed as a beta$(k, d + 1 - k)$ random variable. Thus, the root split for the moment curve data yields a left subtree that is binomial$(n - d, \text{beta}((d+1)/2, (d+1)/2))$ when $d$ is odd. For $d$ even, depending on whether the left subtree is the odd or the even side, we have with equal probability a binomial$(n - d, \text{beta}((d+2)/2, d/2))$ and a binomial$(n - d, \text{beta}(d/2, (d+2)/2))$. One can verify that this is in turn distributed as a binomial$(n - d, \text{beta}(d/2, d/2))$, since the two beta densities average to that of beta$(d/2, d/2)$. If we wish to consider the fraction of points on one side of the hyperplane as $n \to \infty$ for some fixed $d$, the expression is even cleaner: the limiting distribution is simply beta$(\lceil d/2 \rceil, \lceil d/2 \rceil)$ (see, e.g., Devroye [9, Lem. 2] or King [19, §5.2]).

Figure 2: The PDFs (left) and CDFs (right) of the limiting split distribution beta$(\lceil d/2 \rceil, \lceil d/2 \rceil)$ for $d = 1, 3, 7, 15$. The distribution is uniform when $d = 1$ and becomes more tightly concentrated around $1/2$ as $d$ increases.
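The interval description above is easy to simulate. The following Python sketch (assuming NumPy; the function name is hypothetical) picks $d$ of the parameters $1, \ldots, n$, assigns every other point to an odd- or even-numbered interval, and returns the fraction of the $n - d$ remaining points on a uniformly chosen side. Its sample variance can be compared with the variance $1/(4(2k+1))$ of beta$(k, k)$, $k = \lceil d/2 \rceil$.

```python
import numpy as np


def moment_curve_side_fraction(n: int, d: int, rng: np.random.Generator) -> float:
    """Fraction of the n - d non-pivot moment-curve points on a random side of the split."""
    params = np.arange(1, n + 1)
    pivots = np.sort(rng.choice(params, size=d, replace=False))
    others = np.setdiff1d(params, pivots)
    # Number of pivots below each point: 0 means interval 1, 1 means interval 2, ...
    interval_index = np.searchsorted(pivots, others)
    odd_side = np.mean(interval_index % 2 == 0)  # intervals 1, 3, 5, ...
    # Which side is called "left" is arbitrary, so pick one of the two sides at random.
    return float(odd_side if rng.random() < 0.5 else 1.0 - odd_side)
```

For $d = 3$ and $d = 4$ both limits are beta$(2, 2)$ with variance $1/20$; the random side choice is what makes the even case symmetric.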

This tree is indistinguishable from the fringe-balanced, or median-of-$(2t+1)$, search tree, which has been studied quite extensively in the data structure literature. First suggested by Bell [3] and Walker and Wood [34], it is a binary tree constructed on real-valued data. It samples $2t + 1$ data points uniformly without replacement from the $n$ data points, where $t$ is an integer. It then chooses the middle (median) element, and partitions the remaining data points into two sets by using this median point. Assuming without loss of generality that the data points are $U_1, \ldots, U_n$, as above, we see that the leftmost set in the split is precisely binomial$(n - (2t+1), \text{beta}(t+1, t+1))$. Depending upon the implementation, the $2t$ unused pivot points can also be reused in the partition, thus inflating the subtree sizes by $t$ each. For first-order asymptotics, this is an irrelevant choice. If they are not reused, then the median-of-$(2t+1)$ tree is distributed as the hyperplane search tree for the moment curve if we take odd $d = 2t + 1$. As we observed above, the moment curve hyperplane search tree for $d = 2t + 2$ is nearly identical, i.e., at least the beta components have identical parameters. Thus, we will only consider odd $d$.

2.1 A lower bound for the depth 

The depth $D_n$ has been studied by the theory of Markov processes or urn models in a series of papers, notably by Poblete and Munro [30] and Aldous et al. [1]. See also Gonnet and Baeza-Yates [14, p. 109] and Devroye [9], where a central limit theorem for $D_n$ can be found. Poblete and Munro [30] showed that

$$\frac{D_n}{\log n} \to \frac{1}{\sum_{i=t+2}^{2t+2} \frac{1}{i}} \stackrel{\text{def}}{=} A(t) \quad \text{in probability}.$$

Here we give a clean lower bound for $A(t)$ that proves the second half of Theorem 2.

Proposition 1. For all $t$ sufficiently large,

$$\frac{D_n}{\log n} \to A(t) \ge \frac{1}{\log 2} + \frac{\log(3/2)}{4t} \quad \text{in probability}.$$

Proof. We know that the $n$th partial sum in the harmonic series is

$$H_n = \sum_{i=1}^{n} \frac{1}{i} = \log n + \gamma + \frac{1}{2n} + O\!\left(\frac{1}{n^2}\right),$$

where $\gamma = 0.57721\ldots$ is the Euler-Mascheroni constant. Thus, the limit of $D_n/\log n$ is

$$A(t) = \frac{1}{H_{2t+2} - H_{t+1}} = \frac{1}{\log 2 + \frac{1}{4(t+1)} - \frac{1}{2(t+1)} + O(1/t^2)} = \frac{1}{\log 2 - \frac{1}{4t} + O(1/t^2)} = \frac{1}{\log 2} + \frac{1}{4t\log^2 2} + O\!\left(\frac{1}{t^2}\right),$$

which exceeds $\frac{1}{\log 2} + \frac{\log(3/2)}{4t}$ for all $t$ large enough, since $\log(3/2) < 1/\log^2 2$. For odd $d = 2t + 1$, this gives $D_n/\log_2 n \to A(t)\log 2 = 1 + \frac{1}{2d\log 2} + O(1/d^2)$, thus proving the second half of Theorem 2. □
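The expansion in the proof can be checked numerically. The short Python sketch below (the function name is ours) evaluates $A(t) = 1/\sum_{i=t+2}^{2t+2} 1/i$ in floating point, so one can compare $t\,(A(t) - 1/\log 2)$ with the predicted limit $1/(4\log^2 2) \approx 0.5203$ and check the inequality of Proposition 1.

```python
import math


def depth_constant(t: int) -> float:
    """A(t) = 1 / (H_{2t+2} - H_{t+1}), the limit of D_n / log n for the median-of-(2t+1) tree."""
    return 1.0 / math.fsum(1.0 / i for i in range(t + 2, 2 * t + 3))


for t in (1, 10, 100, 1000):
    excess = t * (depth_constant(t) - 1 / math.log(2))
    print(t, depth_constant(t), excess)  # excess approaches 1 / (4 log^2 2)
```

For $t = 0$ and $t = 1$ this reproduces the values $2$ and $12/7$ quoted below for random binary search trees and the beta$(2,2)$ case.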
The law of large numbers for the height is due to Devroye (1993). We have

$$\frac{H_n}{\log n} \to C(t) \quad \text{in probability},$$

where $C(t)$ is the unique solution $c$ greater than $A(t)$ of the equation

$$\lambda(c) - c\sum_{i=t+1}^{2t+1} \log\left(1 + \frac{\lambda(c)}{i}\right) + c\log 2 = 0,$$

and $\lambda(c)$ is defined by the implicit equation

$$\frac{1}{c} = \sum_{i=t+1}^{2t+1} \frac{1}{i + \lambda(c)}.$$

We have $C(t) \to 1/\log 2$ as $t \to \infty$. A table of numerical values is given in Devroye (1993). For example, for the moment curve in dimensions 1 and 2, the behavior is as for random binary search trees: $D_n/\log n \to 2$ in probability and $H_n/\log n \to 4.31107\ldots$ in probability. In dimension 3, we have a beta$(2,2)$ parameter in the split vector, and obtain $D_n/\log n \to 12/7$ in probability and $H_n/\log n \to 3.19257\ldots$ in probability. For $d = 2$, this is optimal, as shown by Devroye, King and McDiarmid [11]. For $d = 3$, the moment curve indeed yields the worst point configuration, thanks to a result of Welzl [35].
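The height constant $C(t)$ is defined only implicitly, but it is easy to compute. Below is a minimal Python sketch (plain bisection, chosen for simplicity rather than speed; the function name is ours). As a lower bracket it uses $c = A(t)$: there $\lambda(A(t)) = 1$, and since $\sum_{i=t+1}^{2t+1}\log(1 + 1/i) = \log 2$, the left-hand side of the defining equation equals $1 > 0$.

```python
import math


def height_constant(t: int, iters: int = 100) -> float:
    """C(t): limit of H_n / log n for the median-of-(2t+1) tree (moment curve, d = 2t + 1)."""
    idx = range(t + 1, 2 * t + 2)  # i = t+1, ..., 2t+1

    def lam(c: float) -> float:
        # Solve 1/c = sum_i 1/(i + lam) for lam >= 0; the sum is decreasing in lam.
        lo, hi = 0.0, 1.0
        while math.fsum(1.0 / (i + hi) for i in idx) > 1.0 / c:
            hi *= 2.0
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if math.fsum(1.0 / (i + mid) for i in idx) > 1.0 / c:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def g(c: float) -> float:
        # Left-hand side of the defining equation for C(t).
        l = lam(c)
        return l - c * math.fsum(math.log1p(l / i) for i in idx) + c * math.log(2)

    lo = 1.0 / math.fsum(1.0 / i for i in range(t + 2, 2 * t + 3))  # A(t), where g = 1 > 0
    hi = 2.0 * lo
    while g(hi) > 0:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

`height_constant(0)` and `height_constant(1)` should reproduce the values $4.31107\ldots$ and $3.19257\ldots$ quoted above; larger $t$ shows the slow approach to $1/\log 2$.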

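To show the last part of Theorem 1, we need to show that $C(t) \ge 1/\log 2 + c/\sqrt{t}$ for some positive constant $c$ and all $t$ large enough.

Proposition 2. For any constant $c < \sqrt{\log 2}$ and all $t$ sufficiently large,

$$\frac{1}{C(t)} \le \log 2 - \frac{c}{\sqrt{t}}.$$

Proof. We reparametrize with respect to $t$ as follows. Define $\lambda = \alpha\sqrt{t}$ and $1/c = \log 2 - \beta/\sqrt{t}$. We plug this back into the definitions of $\lambda$ and $c$, and note that it suffices to show that, as $t \to \infty$, $\beta$ tends to $\sqrt{\log 2}$. Assuming that $\alpha$ remains bounded, first note that

$$\frac{1}{c} = \sum_{i=t+1}^{2t+1} \frac{1}{i + \alpha\sqrt{t}} = \log\left(\frac{2t + \alpha\sqrt{t}}{t + \alpha\sqrt{t}}\right) + O\!\left(\frac{1}{t}\right) = \log 2 + \log\left(1 - \frac{\alpha}{2\sqrt{t}} + O\!\left(\frac{1}{t}\right)\right) + O\!\left(\frac{1}{t}\right) = \log 2 - \frac{\alpha}{2\sqrt{t}} + O\!\left(\frac{1}{t}\right).$$

Thus, $|\alpha/2 - \beta| = O(1/\sqrt{t})$. The second equation relating $\lambda$ to $c$ can be rewritten

$$\frac{\lambda}{c\log 2} + 1 - \frac{1}{\log 2}\sum_{i=t+1}^{2t+1} \log\left(1 + \frac{\lambda}{i}\right) = 0.$$

With the reparametrization, this yields

$$\alpha\sqrt{t}\left(1 - \frac{\beta}{\sqrt{t}\,\log 2}\right) + 1 - \frac{1}{\log 2}\sum_{i=t+1}^{2t+1} \log\left(1 + \frac{\alpha\sqrt{t}}{i}\right) = 0.$$

Since $\alpha$ remains bounded, the last term is

$$\frac{1}{\log 2}\sum_{i=t+1}^{2t+1}\left(\frac{\alpha\sqrt{t}}{i} - \frac{\alpha^2 t}{2i^2} + O\!\left(t^{-3/2}\right)\right) = \frac{\alpha\sqrt{t}\,(H_{2t+1} - H_t)}{\log 2} - \frac{\alpha^2 t}{2\log 2}\sum_{i=t+1}^{2t+1}\frac{1}{i^2} + O\!\left(\frac{1}{\sqrt{t}}\right) = \alpha\sqrt{t} + \frac{\alpha}{4\sqrt{t}\,\log 2} - \frac{\alpha^2}{4\log 2} + O\!\left(\frac{1}{\sqrt{t}}\right),$$

using $H_{2t+1} - H_t = \log 2 + \frac{1}{4t} + O(1/t^2)$ and $\sum_{i=t+1}^{2t+1} 1/i^2 = \frac{1}{2t} + O(1/t^2)$. Putting things together, our equation becomes

$$1 - \frac{\alpha\beta}{\log 2} + \frac{\alpha^2}{4\log 2} + O\!\left(\frac{1}{\sqrt{t}}\right) = 0,$$

that is,

$$\alpha\beta - \log 2 - \frac{\alpha^2}{4} = O\!\left(\frac{1}{\sqrt{t}}\right).$$

Combined with $\beta = \alpha/2 + O(1/\sqrt{t})$, this gives $\alpha^2/4 - \log 2 = O(1/\sqrt{t})$, which happens if $\alpha \to \sqrt{\log 16}$, and thus $\beta \to \sqrt{\log 2}$. □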
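For comparison with Theorem 1, the conclusion of this proof can be restated in the normalization by $\log_2 n$. The following worked expansion is our own restatement (not part of the original derivation), using $\beta \to \sqrt{\log 2}$ and $d = 2t + 1$:

$$C(t)\log 2 = \frac{\log 2}{\log 2 - \beta/\sqrt{t}} = 1 + \frac{\beta}{\sqrt{t}\,\log 2} + O\!\left(\frac{1}{t}\right) = 1 + \frac{1}{\sqrt{t\log 2}} + o\!\left(\frac{1}{\sqrt{t}}\right) = 1 + \sqrt{\frac{2}{d\log 2}} + o\!\left(\frac{1}{\sqrt{d}}\right).$$

So for moment-curve data, $H_n/\log_2 n \to 1 + \sqrt{2/(d\log 2)} + o(d^{-1/2}) \approx 1 + 1.70/\sqrt{d}$, which is of the order $1 + \Omega(d^{-1/2})$ required by the second part of Theorem 1.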

3 Simple balance lemmas 

We need some preliminary results on polytopes. In particular, for a polytope $P$ of $\mathbb{R}^d$ with $n$ vertices, let $f_k(P)$ denote the number of $k$-faces. A special place is occupied by $C_{n,d}$, the cyclic polytope in $\mathbb{R}^d$ having $n$ vertices. As a canonical example of such a polytope we can consider the convex hull of the points $\{(t, t^2, \ldots, t^d) : t = 1, 2, \ldots, n\}$.

McMullen's Upper Bound Theorem (McMullen, 1970; McMullen and Shephard, 1971) [24, 25] states that for all $1 \le k \le d - 1$,

$$\max_P f_k(P) = f_k(C_{n,d}),$$

where the maximum is over all polytopes $P$ in $\mathbb{R}^d$ with $n$ vertices. For more on this, and alternate proofs, see, e.g., Mulmuley (1994) [27], Ziegler (1995) [37], or Kalai (1997) [18]. Exact expressions are well known for $f_k(C_{n,d})$. The one that is of most interest to us is

$$f_{d-1}(C_{n,d}) = \begin{cases} \dfrac{n}{n - d/2}\dbinom{n - d/2}{d/2} & \text{when } d \text{ is even},\\[2ex] 2\dbinom{n - \lfloor d/2 \rfloor - 1}{\lfloor d/2 \rfloor} & \text{when } d \text{ is odd}. \end{cases}$$

This counts the number of full $(d-1)$-dimensional faces (i.e., facets) of $C_{n,d}$. For example, when $n = d + 2$, one can readily verify from these formulas that

$$f_{d-1}(C_{d+2,d}) = \begin{cases} \dfrac{(d+2)^2}{4} & \text{when } d \text{ is even},\\[2ex] \dfrac{(d+1)(d+3)}{4} & \text{when } d \text{ is odd}. \end{cases}$$

We also note (see, e.g., Grünbaum (2003) [15]) that if we are given $n = d + 2$ points in convex position and in general position, then their convex hull $P$ is a simplicial polytope with $n = d + 2$ vertices, so that, by the Upper Bound Theorem, $f_k(P) \le f_k(C_{d+2,d})$ for all $1 \le k \le d - 1$. (For $d \ge 4$ such a polytope need not be combinatorially equivalent to $C_{d+2,d}$, but the inequality is all that we use.) The following lemma is fundamental.

Lemma 1 (The small balance lemma). Consider $d + 2$ points in general position in $\mathbb{R}^d$. Let $H$ be the hyperplane through $d$ of them, chosen uniformly at random. Let $A$ be the event that the two remaining points are on the same side of $H$. Then

$$\mathbb{P}\{A\} \le \frac{1}{2} + \frac{1}{2(d+1)}.$$

Proof. If the points are in convex position as well, then $A$ occurs exactly when the $d$ chosen points span a facet of their convex hull $P$, so by the remarks above (the Upper Bound Theorem for $n = d + 2$), we see that

$$\mathbb{P}\{A\} = \frac{f_{d-1}(P)}{\binom{d+2}{d}} \le \frac{f_{d-1}(C_{d+2,d})}{\binom{d+2}{2}} = \begin{cases} \dfrac{d+2}{2(d+1)} = \dfrac{1}{2} + \dfrac{1}{2(d+1)} & \text{when } d \text{ is even},\\[2ex] \dfrac{d+3}{2(d+2)} = \dfrac{1}{2} + \dfrac{1}{2(d+2)} & \text{when } d \text{ is odd}. \end{cases}$$

If the points are not in convex position, then $d + 1$ of them form a simplex, and one point is strictly inside it. We see that $\mathbb{P}\{A\}$ now equals the probability that, if we choose $d$ points uniformly at random, we fail to pick that interior point. Thus,

$$\mathbb{P}\{A\} = \frac{\binom{d+1}{d}}{\binom{d+2}{d}} = \frac{2}{d+2}.$$

Combining cases, we note that

$$\mathbb{P}\{A\} \le \frac{1}{2} + \frac{1}{2(d+1)}.$$

□
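Lemma 1 is easy to test by brute force. The Python sketch below (assuming NumPy; the names are hypothetical) enumerates all $\binom{d+2}{d}$ hyperplanes through $d$ of $d + 2$ given points and returns the fraction for which the remaining two points fall on the same side. On moment-curve points it should reproduce the cyclic-polytope values from the proof, and on random points it should never exceed $1/2 + 1/(2(d+1))$.

```python
import itertools

import numpy as np


def same_side_fraction(points: np.ndarray) -> float:
    """Fraction of d-subsets of d+2 points whose hyperplane leaves the other two on one side."""
    m, d = points.shape
    assert m == d + 2, "expects exactly d + 2 points in R^d"
    hits = total = 0
    for piv in itertools.combinations(range(m), d):
        a, b = (i for i in range(m) if i not in piv)  # the two remaining points
        base = points[piv[0]]
        rows = points[list(piv[1:])] - base
        sa = np.sign(np.linalg.det(np.vstack([rows, points[a] - base])))
        sb = np.sign(np.linalg.det(np.vstack([rows, points[b] - base])))
        hits += int(sa == sb)
        total += 1
    return hits / total


def moment_curve(n: int, d: int) -> np.ndarray:
    """Points (i, i^2, ..., i^d) for i = 1, ..., n."""
    i = np.arange(1, n + 1, dtype=float)[:, None]
    return i ** np.arange(1, d + 1)
```

For example, four points on the parabola ($d = 2$) give $4/6 = 2/3$, matching $1/2 + 1/(2(d+1))$ for even $d$.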

Even though it is very simple, we already note that hyperplane splits in sets as small as d + 2 
are roughly balanced for large d. Next, we derive an inequality for hyperplane splits for general 
n > d. 

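Lemma 2 (The balance lemma). Consider $n \ge d + 1$ points in general position in $\mathbb{R}^d$. Let $H$ be the hyperplane through $d$ of them, chosen uniformly at random. This splits the remaining $n - d$ points into two sets, $S$ and $S'$. Let $N = \xi|S| + (1 - \xi)|S'|$, where $\xi \in \{0, 1\}$ is Bernoulli(1/2) and independent of $H$. Then, for $x > 0$,

$$\mathbb{P}\left\{N \ge \frac{n-d}{2} + x\right\} \le \frac{\frac{n-d}{4} + \frac{(n-d)^2}{4(d+1)}}{\frac{n-d}{4} + \frac{(n-d)^2}{4(d+1)} + x^2}.$$

Proof. When $n = d + 1$, $N$ equals $0$ or $1$ with probability $1/2$ each; the upper bound is at least $1/2$ when $x \le (n - d)/2$, and for $x > (n - d)/2$ the left-hand side is zero. So assume $n \ge d + 2$. Let $H$ denote the random set of $d$ points (instead of the hyperplane that passes through them). For $x_i \notin H$, let $A(x_i, H)$ denote the event that $x_i$ is on the same side of $H$ as the origin (or any other arbitrary fixed point in general position with the others). Set $Y(x_i, H) = 1$ if $A(x_i, H)$ is true and $Y(x_i, H) = -1$ otherwise. Taking $S$ to be the side containing the origin and writing $\sigma = 2\xi - 1$, we have

$$N = \frac{n-d}{2} + \frac{\sigma}{2}\sum_{x_i \notin H} Y(x_i, H).$$

Thus, $\mathbb{E}\{N\} = (n - d)/2$, and

$$\mathrm{Var}\{N\} = \frac{1}{4}\,\mathbb{E}\left\{\left(\sum_{x_i \notin H} Y(x_i, H)\right)^2\right\} = \frac{1}{4}\,\mathbb{E}\left\{\sum_{x_i \notin H} Y^2(x_i, H) + \sum_{\substack{x_i \ne x_j\\ x_i, x_j \notin H}} Y(x_i, H)\,Y(x_j, H)\right\} = \frac{n-d}{4} + \frac{(n-d)(n-d-1)}{4}\,\mathbb{E}\{Y(x_Z, H)\,Y(x_W, H)\},$$

where $x_Z$ and $x_W$ are randomly drawn without replacement from $\{x_1, \ldots, x_n\} \setminus H$. Continuing,

$$\mathrm{Var}\{N\} = \frac{n-d}{4} + \frac{(n-d)(n-d-1)}{4}\left(2\,\mathbb{P}\{x_Z, x_W \text{ are on the same side of } H\} - 1\right).$$

Now, after first conditioning on the set $H \cup \{x_Z, x_W\}$, which has cardinality $d + 2$ and within which $H$ is a uniformly random $d$-subset, Lemma 1 gives us

$$\mathrm{Var}\{N\} \le \frac{n-d}{4} + \frac{(n-d)(n-d-1)}{4(d+1)} \le \frac{n-d}{4} + \frac{(n-d)^2}{4(d+1)}.$$

By the Chebyshev-Cantelli inequality, we have

$$\mathbb{P}\left\{N \ge \frac{n-d}{2} + x\right\} \le \frac{\mathrm{Var}\{N\}}{\mathrm{Var}\{N\} + x^2}.$$

Plugging in the upper bound on $\mathrm{Var}\{N\}$ gives the result, since the right-hand side is increasing in $\mathrm{Var}\{N\}$. □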
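For small point sets the variance in this proof can be computed exactly by enumerating every $d$-subset. The sketch below (Python with NumPy; helper names are ours) returns $\mathrm{Var}\{N\} = \mathbb{E}\{(|S| - |S'|)^2\}/4$ together with the upper bound $(n-d)/4 + (n-d)^2/(4(d+1))$, so the two can be compared on random configurations.

```python
import itertools

import numpy as np


def exact_split_variance(points: np.ndarray) -> float:
    """Var{N} for a uniformly random hyperplane through d of the points and a random side."""
    n, d = points.shape
    squares = []
    for piv in itertools.combinations(range(n), d):
        base = points[piv[0]]
        rows = points[list(piv[1:])] - base
        signs = [np.sign(np.linalg.det(np.vstack([rows, points[i] - base])))
                 for i in range(n) if i not in piv]
        squares.append(float(sum(signs)) ** 2)  # (|S| - |S'|)^2
    # N - (n - d)/2 = +-(|S| - |S'|)/2, so Var{N} is the mean of the squares divided by 4.
    return sum(squares) / len(squares) / 4.0


def variance_bound(n: int, d: int) -> float:
    """The bound on Var{N} obtained from Lemma 1."""
    return (n - d) / 4 + (n - d) ** 2 / (4 * (d + 1))
```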

It is convenient to have a simpler bound than that of Lemma 2, in which the sample size $n$ is removed. For example, this suffices for our main result:

Lemma 3 (The simplified balance lemma). With notation from Lemma 2, for $x > 1/2$ we have

$$\mathbb{P}\left\{\frac{N}{n} \ge x\right\} \le \min\left(\frac{1}{2}, \frac{1}{1 + 4(d+1)(x - 1/2)^2}\right).$$

Proof. The $1/2$ bound follows from the symmetry in the definition of $N$: since $|S| + |S'| = n - d < 2nx$, the event $N \ge nx$ can occur for at most one of the two values of $\xi$. We begin by formally replacing $x$ in Lemma 2 by $n(x - 1/2) + d/2 = (n - d)(x - 1/2) + dx$, and noting that this is at least $n(x - 1/2) > 0$. With

$$V = \frac{n-d}{4} + \frac{(n-d)^2}{4(d+1)} = \frac{(n-d)(n+1)}{4(d+1)},$$

this gives

$$\mathbb{P}\{N \ge nx\} \le \frac{V}{V + n^2(x - 1/2)^2} = \frac{(n-d)(n+1)}{(n-d)(n+1) + 4(d+1)\,n^2(x - 1/2)^2} \le \frac{1}{1 + 4(d+1)(x - 1/2)^2},$$

where the last step uses $n^2 \ge (n - d)(n + 1)$. □
4 Dominated binary trees



We consider the following general set-up. A tree with $0 \le n < d$ data points is not split and consists of a single node, the root, which "holds" all data points. If $n \ge d$, the root is split in some manner, resulting in left and right subtree sizes $L$ and $R$ satisfying $L + R = n - d$, with $(L/n, R/n)$ stochastically dominated by $(Z, 1 - Z)$, where $Z \in [0, 1]$ is a given random variable symmetric about $1/2$. By stochastic domination, we mean that

$$\mathbb{P}\{\max(L/n, R/n) > x\} \le \mathbb{P}\{\max(Z, 1 - Z) > x\}, \qquad x > 0.$$

From Marshall and Olkin (1979), we recall that for any nondecreasing convex function $\psi$,

$$\mathbb{E}\{\psi(L/n) + \psi(R/n)\} \le \mathbb{E}\{\psi(Z) + \psi(1 - Z)\} = 2\,\mathbb{E}\{\psi(Z)\}.$$

This splitting property is recursively applied to each subtree: given a subtree size (like $L$), and given the data points that are in the subtree, we require the inequality uniformly over all point sets. To save space, we say that we have a tree dominated by $Z$. Let us give two examples.

Example 1. In the random binary search tree, where $d = 1$, we know that $L = \lfloor nU \rfloor$ and $R = n - 1 - L$, where $U$ is uniform on $[0, 1]$. It is trivial to show that $(L/n, R/n)$ is stochastically dominated by $(U, 1 - U)$. Thus the tree is dominated by $Z = U$. □

Example 2. The hyperplane search tree in $\mathbb{R}^d$. Let the larger of the two subtrees of the root have size $N$. By the union bound and Lemma 3, for $x > 1/2$,

$$\mathbb{P}\left\{\frac{N}{n} \ge x\right\} \le \frac{2}{1 + 4(d+1)(x - 1/2)^2} \le \frac{1}{2(d+1)(x - 1/2)^2}.$$

Let $W \in [1/2, 1]$ be a random variable with distribution function given by

$$\mathbb{P}\{W \ge x\} = \begin{cases} 1 & \text{if } x \le 1/2 + \sqrt{1/(2(d+1))},\\[1ex] \dfrac{1}{2(d+1)(x - 1/2)^2} & \text{if } x \in \left[1/2 + \sqrt{1/(2(d+1))},\ 1\right],\\[1ex] 0 & \text{if } x > 1. \end{cases}$$

That is, $W$ is supported on $\left[1/2 + \sqrt{1/(2(d+1))},\ 1\right]$ and has an atom of weight $2/(d+1)$ at $1$. It takes a moment to verify that

$$W \stackrel{\mathcal{L}}{=} \frac{1}{2} + \min\left(\frac{1}{2}, \frac{1}{\sqrt{2(d+1)U}}\right),$$

where $U$ is uniform on $[0, 1]$. Thus, the $Z$ in the preceding discussion can be taken as

$$Z = \frac{1}{2} + \sigma\min\left(\frac{1}{2}, \frac{1}{\sqrt{2(d+1)U}}\right),$$

where $\sigma \in \{-1, +1\}$ is a random equiprobable sign. We will see a stronger domination result for hyperplane search trees further on. □
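The representation of $W$ can be checked by simulation. The Python sketch below (assuming NumPy; names are ours) draws $W = 1/2 + \min(1/2, 1/\sqrt{2(d+1)U})$ and compares empirical tail probabilities with the displayed distribution function, including the atom of weight $2/(d+1)$ at $1$.

```python
import numpy as np


def sample_w(d: int, size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw W = 1/2 + min(1/2, 1/sqrt(2(d+1)U)) with U uniform."""
    u = 1.0 - rng.random(size)  # uniform on (0, 1], avoids division by zero
    return 0.5 + np.minimum(0.5, 1.0 / np.sqrt(2 * (d + 1) * u))


def w_tail(d: int, x: float) -> float:
    """P{W >= x} from the distribution function above, for 1/2 <= x <= 1."""
    if x <= 0.5 + np.sqrt(1 / (2 * (d + 1))):
        return 1.0
    return 1.0 / (2 * (d + 1) * (x - 0.5) ** 2)
```

The atom arises because the minimum equals $1/2$ exactly when $U \le 2/(d+1)$.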
5 Height of dominated trees and a proof of Theorem 1

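We recall the notion of a tree dominated by $Z$ and define

$$Z^* = \max(Z, 1 - Z).$$

Lemma 4. For a constant $\gamma > 0$, if

$$\inf_{\lambda > 0}\, e^{\lambda}\left(\mathbb{E}\{2(Z^*)^{\lambda}\}\right)^{\gamma} < 1,$$

then

$$\lim_{n\to\infty} \mathbb{P}\{H_n > \lceil \gamma\log n \rceil\} = 0.$$

Proof. Let $t = \lceil \gamma\log n \rceil$. By domination, we know that both subtrees of the root are stochastically not larger than $nZ^*$. By repeating this observation as we descend away from the root following any path of length $t$, we deduce that the size of the subtree at that node is stochastically not larger than

$$n\prod_{i=1}^{t} Z_i^*,$$

where $Z_1^*, Z_2^*, \ldots$ is an i.i.d. sequence distributed as $Z^*$. A point at depth greater than $t$ can only exist if some node at depth $t$ roots a subtree with at least $d + 1$ points. Therefore, by the union bound over the at most $2^t$ nodes at depth $t$ of a binary tree, and by Markov's inequality with a constant $\lambda > 0$ for which the condition holds,

$$\mathbb{P}\{H_n > t\} \le 2^t\,\mathbb{P}\left\{n\prod_{i=1}^{t} Z_i^* \ge d + 1\right\} \le 2^t\left(\frac{n}{d+1}\right)^{\lambda}\left(\mathbb{E}\{(Z^*)^{\lambda}\}\right)^t = \left(\frac{n}{d+1}\right)^{\lambda}\left(\mathbb{E}\{2(Z^*)^{\lambda}\}\right)^t.$$

Since the condition forces $\mathbb{E}\{2(Z^*)^{\lambda}\} < 1$ and $t \ge \gamma\log n$, the upper bound is not more than

$$(d+1)^{-\lambda}\left(e^{\lambda}\left(\mathbb{E}\{2(Z^*)^{\lambda}\}\right)^{\gamma}\right)^{\log n},$$

which tends to zero since

$$e^{\lambda}\left(\mathbb{E}\{2(Z^*)^{\lambda}\}\right)^{\gamma} < 1.$$

□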



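Proof of Theorem 1. Let us take

$$Z^* = \min\left(\frac{1}{2} + a\sqrt{E + b},\ 1\right) \le \frac{1}{2} + a\sqrt{E + b},$$

where $E$ is standard exponential and $a, b > 0$. Then, since $1 + y \le e^y$,

$$\mathbb{E}\{(2Z^*)^{\lambda}\} \le \mathbb{E}\left\{\exp\left(2a\lambda\sqrt{E + b}\right)\right\}.$$

Choose $\lambda = 1/(2a)$, and define $\rho = \mathbb{E}\{\exp(\sqrt{E + b})\}$, so that $\mathbb{E}\{2(Z^*)^{\lambda}\} = 2^{1-\lambda}\,\mathbb{E}\{(2Z^*)^{\lambda}\} \le 2^{1-\lambda}\rho$. Then

$$e^{\lambda}\left(\mathbb{E}\{2(Z^*)^{\lambda}\}\right)^{\gamma} \le \exp\left(\lambda + \gamma\left(\log(2\rho) - \lambda\log 2\right)\right) < 1$$

provided

$$\gamma > \frac{\lambda}{\lambda\log 2 - \log(2\rho)} = \frac{1}{\log 2 - 2a\log(2\rho)},$$

where $d$ is large enough that the denominator is positive. In particular, if $a = O(1/\sqrt{d})$ and $b$ is a fixed constant, then $\gamma$ can be chosen with $\gamma\log 2 = 1 + O(1/\sqrt{d})$, and this, along with Lemma 4 and $H_n/\log_2 n = (H_n/\log n)\log 2$, implies the first part of Theorem 1. The domination of Example 2 is too weak for this step, since its $Z^*$ has an atom of weight $2/(d+1)$ at $1$; however, Lemmas 7 and 8 in Section 7 show that a hyperplane search tree is dominated by precisely such a $Z^*$, with $a = 1/\sqrt{2d}$ and $b = \log 8$. □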
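To see the constants in this argument, one can evaluate $\rho = \mathbb{E}\{\exp(\sqrt{E + b})\}$ numerically and then the threshold on $\gamma$ for the values $a = 1/\sqrt{2d}$ and $b = \log 8$ named above. The Python sketch below is our own illustration (trapezoidal quadrature truncated at $E = 80$; names are hypothetical). It returns the resulting bound $\gamma\log 2$ on $H_n/\log_2 n$, or infinity when $d$ is too small for the denominator to be positive; for large $d$, $\sqrt{d}\,(\gamma\log 2 - 1)$ settles near $\sqrt{2}\log(2\rho)/\log 2$.

```python
import math

import numpy as np


def rho(b: float) -> float:
    """E{exp(sqrt(E + b))} for standard exponential E, by the trapezoidal rule on [0, 80]."""
    s = np.linspace(0.0, 80.0, 800_001)
    f = np.exp(np.sqrt(s + b) - s)  # integrand times the exponential density
    h = s[1] - s[0]
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))


def height_bound(d: int, b: float = math.log(8)) -> float:
    """gamma * log 2 with gamma = 1 / (log 2 - 2a log(2 rho)) and a = 1/sqrt(2d)."""
    a = 1.0 / math.sqrt(2 * d)
    denom = math.log(2) - 2 * a * math.log(2 * rho(b))
    return math.log(2) / denom if denom > 0 else math.inf
```

The constant obtained this way is not sharp; only its $O(1/\sqrt{d})$ order matters for Theorem 1.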



6 Logarithmic moments and depth of dominated trees 

The depth of a random node in a tree dominated by $Z$ is determined by the logarithmic moment

$$\mu = 2\,\mathbb{E}\{Z\log(1/Z)\} = \mathbb{E}\{Z\log(1/Z) + (1 - Z)\log(1/(1 - Z))\} = \mathbb{E}\{Y\},$$

where $Y$ is a random variable defined as follows:

$$Y = \begin{cases} \log(1/Z) & \text{with probability } Z,\\ \log(1/(1 - Z)) & \text{with probability } 1 - Z. \end{cases}$$

Note that since $x\log x$ is bounded on $[0, 1]$, $\mu$ is bounded. Also, $\mu = 0$ if and only if $Z \in \{0, 1\}$ almost surely, i.e., $Z$ is Bernoulli(1/2) (recalling that $Z$ is symmetric). We first provide two useful general lemmas for computing a bound on the logarithmic moment and obtaining a one-sided law of large numbers for general $Z$.

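Lemma 5. For a random variable $Z = 1/2 + \sigma V$, where $V$ is $[0, 1/2]$-valued and $\sigma \in \{-1, +1\}$ is a random independent equiprobable sign,

$$\mu \ge \log 2 - \alpha\,\mathbb{E}\{V^2\},$$

where $\alpha = 2\left(1 + \sqrt{\log 8}\right)^2 < 19$.

Proof. Note that

$$-\mu = \mathbb{E}\left\{\left(\tfrac{1}{2} + V\right)\log\left(\tfrac{1}{2} + V\right) + \left(\tfrac{1}{2} - V\right)\log\left(\tfrac{1}{2} - V\right)\right\} = -\log 2 + \mathbb{E}\{f(2V)\},$$

where

$$f(v) \stackrel{\text{def}}{=} \frac{(1 + v)\log(1 + v) + (1 - v)\log(1 - v)}{2}, \qquad 0 \le v \le 1.$$

We check that $f(0) = f'(0) = 0$ and $f''(v) = 1/(1 - v^2) > 0$, so $f$ is convex and increasing to $f(1) = \log 2$. On $[0, b]$, with $b < 1$, we have

$$f(v) \le \frac{v^2}{2(1 - b^2)} \le \frac{v^2}{2(1 - b)^2}.$$

On $[b, 1]$, we have $f(v) \le \log 2 \le (v/b)^2\log 2$. Combining this and choosing $b = \sqrt{\log 8}/(1 + \sqrt{\log 8})$, so that $1/(2(1-b)^2) = (1 + \sqrt{\log 8})^2/2 = \alpha/4$ and $\log 2/b^2 = (1 + \sqrt{\log 8})^2/3 < \alpha/4$, we see that

$$f(v) \le \frac{\alpha}{4}\,v^2, \qquad 0 \le v \le 1.$$

Hence $\mathbb{E}\{f(2V)\} \le \alpha\,\mathbb{E}\{V^2\}$, which proves the claim. □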
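The two ingredients of this proof, the quadratic bound $f(v) \le (\alpha/4)v^2$ on $[0, 1]$ and the resulting inequality $\mu \ge \log 2 - \alpha\,\mathbb{E}\{V^2\}$, can be spot-checked numerically. In the Python sketch below (assuming NumPy; names are ours), `log_moment` estimates $\mu$ directly from its definition, given samples of $V$, with the convention $0\log 0 = 0$.

```python
import math

import numpy as np

ALPHA = 2 * (1 + math.sqrt(math.log(8))) ** 2  # the constant of Lemma 5 (about 11.93)


def f(v: np.ndarray) -> np.ndarray:
    """f(v) = ((1+v)log(1+v) + (1-v)log(1-v)) / 2 on [0, 1], with 0 log 0 = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        right = np.where(v < 1, (1 - v) * np.log(1 - v), 0.0)
    return 0.5 * ((1 + v) * np.log1p(v) + right)


def log_moment(v: np.ndarray) -> float:
    """mu = E{Z log(1/Z) + (1-Z) log(1/(1-Z))} with Z = 1/2 + V (the sign is irrelevant)."""
    z = 0.5 + v
    with np.errstate(divide="ignore", invalid="ignore"):
        lower = np.where(z < 1, -(1 - z) * np.log(1 - z), 0.0)
    return float(np.mean(-z * np.log(z) + lower))
```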






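Lemma 6. In a random binary tree dominated by $Z$, having logarithmic moment $\mu > 0$, we have, for every $\epsilon > 0$,

$$\lim_{n\to\infty} \mathbb{P}\left\{\frac{D_n}{\log n} > \frac{1}{\mu} + \epsilon\right\} = 0.$$

Proof. Let us begin with a small observation. Let $\lambda > 0$ be a parameter and let $X \le 0$ be a nonpositive random variable. Then

$$\lim_{\lambda \downarrow 0} \mathbb{E}\left\{\frac{e^{\lambda X} - 1}{\lambda}\right\} = \mathbb{E}\{X\}.$$

This is best seen by noting that $(e^{\lambda X} - 1)/\lambda \ge X$, which provides a lower bound. Since $(e^{\lambda X} - 1)/\lambda \le 0$, we have by Fatou's lemma,

$$\limsup_{\lambda \downarrow 0} \mathbb{E}\left\{\frac{e^{\lambda X} - 1}{\lambda}\right\} \le \mathbb{E}\left\{\limsup_{\lambda \downarrow 0} \frac{e^{\lambda X} - 1}{\lambda}\right\} = \mathbb{E}\{X\}.$$

Thus, applying this to $X = -Y$, as $\lambda \downarrow 0$,

$$\varphi(\lambda) \stackrel{\text{def}}{=} \mathbb{E}\{e^{-\lambda Y}\} = 1 - \lambda\,\mathbb{E}\{Y\} + o(\lambda) = 1 - \lambda\mu + o(\lambda).$$

Let us show by induction on the integers $t$ that for all $\lambda > 0$ and $n \ge 1$,

$$\mathbb{P}\{D_n > t\} \le n^{\lambda}(\varphi(\lambda))^t, \qquad t \ge 0.$$

Assuming this for a moment, we have, with $t = \lceil (1/\mu + \epsilon)\log n \rceil$,

$$\mathbb{P}\{D_n > t\} \le n^{\lambda}\left(1 - \lambda\mu + o(\lambda)\right)^t \le \exp\left(\left(\lambda + (1/\mu + \epsilon)\log(1 - \lambda\mu + o(\lambda))\right)\log n\right) = \exp\left(\left(-\lambda\mu\epsilon + o(\lambda)\right)\log n\right) = o(1)$$

if we choose $\lambda > 0$ small enough but fixed. This would complete the proof.

For the proof by induction, note that for $t = 0$ the inequality is trivial. So, we consider a general $t > 0$. Denote by $X(L)$ and $X(R)$ the subsets of data points that end up in the left and right subtrees of the root, and by $D_L$ and $D_R$ the depths (relative to their subtree roots) of data points randomly selected from $X(L)$ and $X(R)$, respectively. Then, using the induction hypothesis and the domination property with the nondecreasing convex function $x \mapsto x^{1+\lambda}$,

$$\begin{aligned}
\mathbb{P}\{D_n > t\} &\le \mathbb{E}\left\{\frac{L}{n}\,\mathbb{P}\{D_L > t - 1 \mid X(L)\} + \frac{R}{n}\,\mathbb{P}\{D_R > t - 1 \mid X(R)\}\right\}\\
&\le \mathbb{E}\left\{\frac{L}{n}\,L^{\lambda}(\varphi(\lambda))^{t-1} + \frac{R}{n}\,R^{\lambda}(\varphi(\lambda))^{t-1}\right\}\\
&\le n^{\lambda}(\varphi(\lambda))^{t-1}\,\mathbb{E}\left\{\left(\frac{L}{n}\right)^{1+\lambda} + \left(\frac{R}{n}\right)^{1+\lambda}\right\}\\
&\le n^{\lambda}(\varphi(\lambda))^{t-1}\,\mathbb{E}\left\{Z^{1+\lambda} + (1 - Z)^{1+\lambda}\right\}\\
&= n^{\lambda}(\varphi(\lambda))^{t-1}\,\mathbb{E}\{e^{-\lambda Y}\}\\
&= n^{\lambda}(\varphi(\lambda))^{t}.
\end{aligned}$$

□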
The inequalities of Section 3 are powerful enough to obtain a depth bound that is almost asymptotically tight. Our proof is of independent interest since it uses only Section 3, the previous two lemmas, and simple textbook arguments.

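Proposition 3. Consider a hyperplane search tree for a collection of points $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$ that are in general position. For fixed $d$, there exists a constant $C(d)$ such that for all $\epsilon > 0$,

$$\limsup_{n\to\infty} \mathbb{P}\{D_n > (C(d) + \epsilon)\log_2 n\} = 0.$$

Furthermore, as $d \to \infty$,

$$C(d) = 1 + O\!\left(\frac{\log d}{d}\right).$$

Proof. Observe that in the definition of $\mu$, we can take

$$Z = \frac{1}{2} + \sigma V, \qquad \text{where } V \stackrel{\text{def}}{=} \min\left(\frac{1}{2}, \frac{1}{\sqrt{2(d+1)U}}\right),$$

as in Example 2. By Lemma 5, for this $Z$,

$$\begin{aligned}
\mu &\ge \log 2 - \alpha\,\mathbb{E}\{V^2\} = \log 2 - \alpha\,\mathbb{E}\left\{\min\left(\frac{1}{4}, \frac{1}{2(d+1)U}\right)\right\}\\
&= \log 2 - \frac{\alpha}{2(d+1)}\,\mathbb{E}\left\{\min\left(\frac{d+1}{2}, \frac{1}{U}\right)\right\}\\
&= \log 2 - \frac{\alpha}{2(d+1)}\left(1 + \log\frac{d+1}{2}\right).
\end{aligned}$$

Combining Lemma 6 (with $C(d) = \log 2/\mu$) with this bound then completes the proof. □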
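The lower bound on $\mu$ obtained in this proof gives an explicit, if crude, value $C(d) = \log 2/\mu$. The Python sketch below (names are ours) evaluates it and shows that $(C(d) - 1)\,d/\log d$ stays bounded, in line with $C(d) = 1 + O(\log(d)/d)$; it returns infinity for small $d$, where this particular lower bound on $\mu$ is not positive.

```python
import math

ALPHA = 2 * (1 + math.sqrt(math.log(8))) ** 2  # the constant of Lemma 5


def depth_constant_bound(d: int) -> float:
    """log 2 / mu_lower with mu_lower = log 2 - ALPHA/(2(d+1)) * (1 + log((d+1)/2))."""
    mu_lower = math.log(2) - ALPHA / (2 * (d + 1)) * (1 + math.log((d + 1) / 2))
    return math.log(2) / mu_lower if mu_lower > 0 else math.inf


for d in (100, 1000, 10 ** 4, 10 ** 5):
    print(d, depth_constant_bound(d), (depth_constant_bound(d) - 1) * d / math.log(d))
```

The ratio in the last column tends to $\alpha/(2\log 2) \approx 8.6$, which reflects the $\log d$ loss that the stronger domination of Section 7 removes.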



7 Stronger bounds on (≤k)-facets and a proof of Theorem 2

Analysis of random hyperplane splits is directly related to the problem of counting $k$-facets in discrete geometry. For a set of $n$ points in general position in $\mathbb{R}^d$, a subset of $d$ points, along with an orientation, defines an oriented hyperplane with an associated positive open halfspace. If this halfspace contains exactly $k$ of the remaining $n - d$ points, we say that the oriented set of $d$ points is a $k$-facet. Thus each subset of $d$ points defines, for some $0 \le k \le \lfloor (n-d)/2 \rfloor$, a $k$-facet with one orientation and an $(n - d - k)$-facet with the other orientation. A $(\le k)$-facet is simply a $j$-facet for some $j \le k$. Knowing the probability mass function of $N$ is equivalent to knowing the number of $(\le k)$-facets for every $0 \le k \le \lfloor (n-d)/2 \rfloor$. For a thorough treatment of $k$-facets we direct the reader to Wagner's 2008 survey [33].

A significant open conjecture in discrete geometry is the Spherical Generalized Upper Bound Conjecture, or SGUBC. Forms of this conjecture were proposed independently by Eckhoff [12], Linhart [21], and Welzl [35]. Wagner [32, Conjecture 1.2] states the conjecture in full generality. Here we state a slightly weaker form of the conjecture in the language of this paper, which would be implied by the SGUBC.
Conjecture 1. For a set of $n \ge d$ points in general position in $\mathbb{R}^d$, define the random variable $N$ as the number of points on the larger side of a random hyperplane split. Define $N^*$ as the analogous random variable for the larger side of a random hyperplane split for the moment curve data in $\mathbb{R}^d$. Then, for any $x \ge (n - d)/2$,

$$\mathbb{P}\{N \ge x\} \le \mathbb{P}\{N^* \ge x\}.$$

The SGUBC is trivially true for $d = 1$. For $d = 2$, it was proved by Peck [28] and Alon and Győri [2]. Welzl [35] proved Conjecture 1 for $d = 3$. Inequalities for the far right tail of $N$ were obtained by Clarkson and Shor [4]. For general $d \ge 1$, Wagner [32] proved a relaxed form of a conjecture closely related to the SGUBC that implies Lemma 7 below.

Lemma 7 (Wagner). With $N$ and $N^*$ defined as above, for any $x \ge (n - d)/2$,

$$\mathbb{P}\{N \ge x\} \le 4\,\mathbb{P}\{N^* \ge x\}.$$

Clarkson and Shor [4] proved a somewhat similar result many years earlier, but their bound only holds for the extreme tail of the distribution, i.e., as $x/n$ approaches $1$. It is therefore insufficient for our purposes. Wagner's bound, on the other hand, is valid for the entire range of $x$ that concerns us. In order to exploit this result using the machinery of the previous section, we must first bound $\mathbb{P}\{N^* \ge x\}$ in an appropriate manner.

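Lemma 8. For a set of $n$ distinct points on the moment curve in $\mathbb{R}^d$, we have, for $x > 1/2$,

$$\mathbb{P}\left\{\frac{N^*}{n} \ge x\right\} \le 2\exp\left(-2d(x - 1/2)^2\right).$$

Proof. After first fixing $x > 1/2$, for the sake of analysis we introduce random variables

$$B \stackrel{\mathcal{L}}{=} \text{beta}(\lceil d/2 \rceil, \lceil d/2 \rceil)$$

and $\xi_{d,x}$ with a binomial$(d, x)$ distribution. It is known (see Devroye) that $\mathbb{P}\{N^*/n > y\} \le \mathbb{P}\{\max(B, 1 - B) > y\}$ for all $y > 0$, i.e., $N^*/n$ is stochastically dominated by $\max(B, 1 - B)$. It is also known that the distribution functions of $B$ and $\xi_{d,x}$ are dual: for $x \in (0, 1)$,

$$\mathbb{P}\{B > x\} = \begin{cases} \mathbb{P}\{\xi_{d,x} \le (d-1)/2\} & \text{if } d \text{ is odd},\\[1ex] \tfrac{1}{2}\,\mathbb{P}\{\xi_{d,x} \le (d-2)/2\} + \tfrac{1}{2}\,\mathbb{P}\{\xi_{d,x} \le d/2\} & \text{if } d \text{ is even}. \end{cases}$$

Thus,

$$\begin{aligned}
\mathbb{P}\left\{\frac{N^*}{n} \ge x\right\} &\le \mathbb{P}\{\max(B, 1 - B) \ge x\} = 2\,\mathbb{P}\{B \ge x\}\\
&\le 2\,\mathbb{P}\{\xi_{d,x} \le \lfloor d/2 \rfloor\}\\
&\le 2\exp\left(-\frac{2(dx - \lfloor d/2 \rfloor)^2}{d}\right) \qquad \text{(by Hoeffding's inequality [17])}\\
&\le 2\exp\left(-\frac{2d^2(x - 1/2)^2}{d}\right)\\
&= 2\exp\left(-2d(x - 1/2)^2\right),
\end{aligned}$$

concluding the proof. □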
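The chain of inequalities in this proof can be checked with exact binomial sums. The Python sketch below (standard library only; names are ours) evaluates $\mathbb{P}\{B > x\}$ through the duality displayed above and compares $2\,\mathbb{P}\{B > x\}$ with $2\exp(-2d(x - 1/2)^2)$ on a grid of $d$ and $x$.

```python
import math


def binom_cdf(m: int, k: int, p: float) -> float:
    """P{Binomial(m, p) <= k}."""
    return math.fsum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))


def beta_upper_tail(d: int, x: float) -> float:
    """P{B > x} for B ~ beta(ceil(d/2), ceil(d/2)), via the binomial duality in the proof."""
    if d % 2 == 1:
        return binom_cdf(d, (d - 1) // 2, x)
    return 0.5 * binom_cdf(d, (d - 2) // 2, x) + 0.5 * binom_cdf(d, d // 2, x)
```

For $d = 2$ the formula reduces to $1 - x$, the tail of the uniform distribution, as it should since beta$(1, 1)$ is uniform.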
Proof of Theorem 2. Let $E$ be a standard exponential random variable, let $\sigma \in \{-1, +1\}$ be a random equiprobable sign, and define

$$Z = \frac{1}{2} + \sigma\min\left(\frac{1}{2}, \sqrt{\frac{E + \log 8}{2d}}\right).$$

Note that for $1/2 < x \le 1$, Lemmas 7 and 8 give

$$\mathbb{P}\left\{\frac{N}{n} \ge x\right\} \le \min\left(1, 8\exp\left(-2d(x - 1/2)^2\right)\right) = \mathbb{P}\{\max(Z, 1 - Z) \ge x\},$$

and thus the hyperplane search tree is dominated by this $Z$.

The logarithmic moment of $Z$ is easily bounded using Lemma 5. With $\alpha = 2(1 + \sqrt{\log 8})^2$ we have

$$\mu \ge \log 2 - \alpha\,\mathbb{E}\left\{\min\left(\frac{1}{4}, \frac{E + \log 8}{2d}\right)\right\} \ge \log 2 - \alpha\,\frac{\mathbb{E}\{E\} + \log 8}{2d} = \log 2 - \frac{\alpha(1 + \log 8)}{2d}.$$

This implies that $\mu \ge (1 - O(1/d))\log 2$, so that $\log 2/\mu = 1 + O(1/d)$, and therefore the first part of Theorem 2 follows from Lemma 6. Thus, the sharper estimates for domination that flow from Wagner's inequality give the optimal rate of convergence with respect to $d$. □